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ABSTRACT 

We develop a method to extract the shape information of line profiles from discrete 
kinematic data. The Gauss-Hermite expansion, which is widely used to describe the 
line of sight velocity distributions extracted from absorption spectra of elliptical galax- 
ies, is not readily applicable to samples of discrete stellar velocity measurements, ac- 
companied by individual measurement errors and probabilities of membership. These 
include datasets on the kinematics of globular clusters and planetary nebulae in the 
outer parts of elliptical galaxies, as well as giant stars in the Local Group galaxies 
and the stellar populations of the Milky Way. We introduce two parameter families of 
probability distributions describing symmetric and asymmetric distortions of the line 
profiles from Gaussianity. These are used as the basis of a maximum likelihood estima- 
tor to quantify the shape of the line profiles. Tests show that the method outperforms 
a Gauss-Hermite expansion for discrete data, with a lower limit for the relative gain 
of w 2 for sample sizes N ss 800. To ensure that our methods can give reliable de- 
scriptions of the shape, we develop an efficient test to assess the statistical quality of 
the obtained fit. 

As an application, we turn our attention to the discrete velocity datasets of the 
dwarf spheroidals (dSphs) of the Milky Way. Sculptor and Fornax have datasets of 
> 1000 line of sight velocities of probable member stars. In Sculptor, the symmet- 
ric deviations are everywhere consistent with velocity distributions more peaked than 
Gaussian. In Fornax, instead, there is an evolution in the symmetric deviations of 
the line profile from a pcakicr to more flat-topped distribution on moving outwards. 
Although the datasets for Carina and Sextans are smaller, they still comprise several 
hundreds of stars. Our methods are sensitive enough to detect evidence for velocity 
distributions more peaked than Gaussian. These results suggest a radially biased or- 
bital structure for the outer parts of Sculptor, Carina and Sextans. On the other hand, 
tangential anisotropy is favoured in Fornax. This is all consistent with a picture in 
which Fornax may have had a different evolutionary history to Sculptor, Carina and 
Sextans. 
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1 INTRODUCTION 

Our ability to uncover the elusive properties of dark matter 
in galaxies is based on the analysis of velocities of stars. For 
the case of pressure-supported systems like elliptical galax- 
ies, a great advantage is obtained by considering the prop- 
erties of the entire line profile - that is, the shape of the 
line of sight velocity distributions Sf(v) - rather than just 
the first two velocity moments. This helps break the perni- 
cious mass-anisotropy degeneracy, which otherwise enables 
dark matter mass to be traded against velocity anisotropy 
at both small and large radii, and hence hidden away. 
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For elliptical galaxies, higher order velocity information 
can be extracted from absorption line spectra. The shape 
of the velocity distributions is usually quantified w ithin the 
frame work of a Gauss-Hermite series, intr oduced in lGerhardl 
l| 199311 and Ivan der Marel fc Framd (| 19931 ). Given the veloc- 
ity distribution Ji?(v), the associated Gauss-Hermite series 
is defined by the relation 



JS?(«) = 



\/2ii 



=exp 



v — [i 



n 



(1) 

in which the parameters 7, /j, and a identify respectively the 
normalization, mean and standard deviation of the best fit- 
ting Gaussian, while the coefficients hi specify the shape in- 
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formation. The advantage of this formalism is that Hermite 
polynomials Hi are orthonormal with respect to a Gaus- 
sian weight function. The (lowest order) Gauss-Hermite mo- 
ments measure structure in the central parts of the velocity 
distribution and have a limited dependence on the poorly- 
determined tails. 

However, a disadvantage of the Gauss-Hermite formal- 
ism is that it cannot be easily applied to the large class of 
problems in which the kinematic observations come in the 
form of discrete velocity measurements, rather than as line of 
sight velocity distributions extracted from absorption spec- 
tra. This includes the modelling of the dynamics of galaxies 
at large radii, where integrated-light spectroscopy is not pos- 
sible because of low surface brightness. Here, tracers such as 
globular clusters and planetary nebulae are used to probe 
the realm of dark matter (e.g., [R omano wskv et al.l 2003 ; 
Coccato et alj 120091 ; iNapolitano et al.l l201ll ; iDeason et al.l 
2012ft . In the Local Group, ground-based observations of 
nearby dwarf spheroidal galaxies and globular clusters have 
enabled impressive datasets of up to thousan ds of individ- 
ual li n e of s i ght velocities to be gath e red fe.g..|Klevna et al.l 



20021. |2004 IWilkinson et all 12004 iBattaglia et al 



Reiins et a l. 2006; Battaglia et al.ll2008l : IWalker et al.l 



2006 



2009 



2010ft . Clusters of galaxies provide another example in which 



the k inematic information i s available as discrete v elocities 
fe.g.. lLokas fe Mamonll2003l ; IWoitak fc LokasH2010t ). 

For such datasets, the Gauss-Hermite formalism has 
shortcomings that limit both accuracy and precision. Diffi- 
culties are mainly connected with the heterogeneous obser- 
vational uncertainties and the probabilities of membership. 
Addit ionally, a s traigh tfor ward implementation of the me th- 
ods of lGerhardl l| 1993ft and Ivan der Marel fc Frarixl || 1993ft is 
only possible for continuous data, thus introducing the po- 
tentially important loss of information due to data binning. 
In general, a dataset of discrete velocities V = {vi , • • • , vn} 
comes together with a set of observational uncertainties 
A = {Si, ■ ■ ■ , Sn}, the values of which are usually inhomoge- 
neous. Also, the different kinematic tracers may have differ- 
ent probabilities of membership, P — {pi, ■ ■ ■ ,pjv}, which 
should also be included. It is difficult to account properly for 
this information within the framework of a Gauss-Hermite 
series, as the binning procedure erases virtually everything 
except V. For this reason, the accuracy of the results ob- 
tained can be seriously diminished if the observational un- 
certainties are large (in terms of the intrinsic dispersion 
(Tint), and/or if either the observational uncertainties them- 
selves or the membership probabilities are highly inhomoge- 
neous. 

Furthermore, whatever the sample size N, the shape 
measured within the Gauss-Hermite framework is not the 
shape of the intrinsic velocity distribution jSf itself, but 
rather the shape of its convolution with the uncertainties' 
kernel, often taken to be Gaussian. This causes an attenua- 
tion of the signal due to the intrinsic deviations from Gaus- 
sianity in S£ . The magnitude of this attenuation needs to be 
separately quantified and then simulated on models before 
direct comparison with the observables, which is a lengthy 
procedure. 

Given these difficulties, it is clear that a feasible solution 
might be to use Bayesian methods, which naturally allow us 
to include all available information, such as uncertainties of 
any origin, as well as probabilities of membership. Unfortu- 



nately, the Gauss-Hermite series is not always positive defi- 
nite and so does not itself define proper probability distribu- 
tions. In order to implement a maximum-likelihood method, 
we will have to introduce a new suitable parametrization. At 
first glance, this may be viewed as a limitation of a Bayesian 
framework, since any parametric family of velocity distribu- 
tions may not be flexible enough to describe the dataset. 
However, we put in place an analytic device that allows us 
to test directly whether the description of the observational 
sample that is recovered within a parametric family is a good 
statistical description or not. Hence, it is always possible to 
identify whether the adopted parametrization is suitable. 

As an application for our new methods, we turn 
our attention to the highly dark matter dominated dwarf 
spheroidal galaxies (dSphs) of the Milky Way. Here, the 
goal is to map out the density distribution of the dark mat- 
ter and compare it to the theoretical predictions of hier- 
archical cosmologies. Sometimes, as in the nearby Sculp- 
tor dSph, the properties of the dark halo profile have been 
strongly constrained by exploiting the fortunate coexistence 
of multiple stell ar populations, having different metallicities 
and kinematics (Battaglia et al. 2008; Walker & Pcn arrubial 
l201ll ; lAmorisco fc EvandbOl^ whilst Ide Boer et al.1 (|2012ft 
have mapped out the detailed star formation history. 
IJardel fc GebhardtJ (|2012ft have recently shown for the For- 
nax dSph, that even with just one (perhaps composite) stel- 
lar population, the detailed modelling of the velocity distri- 
butions may be able to constrain tightly the mass profile. 
A key ingredient here is the use of the shape information 
of the line profiles in addition to the second moments fa- 
miliar from straightforward Jeans equ ation modelling (e.g. 
iGilmore etlfll2007l ; IWalker et al.ll2010ft . By themselves, the 
Jeans equations do not provide enough information to per- 
mit the da rk matter distribu tion to be mapped out unam- 
biguously |Evans et al.ll2009ft . 

We use our new methods to analyze the discrete ve- 
locity datasets of four dSphs - Sculptor, Carina, Sextans 
and Fornax - and obtain for the first time detailed mea- 
surements of the higher velocity moments. This information 
provides powerful observables for future dynamical analy- 
ses of the dSphs, and will help constrain the mass profiles 
in these systems. Also, since the the formation history of 
dSphs is mirrored in their current orbital structure, detailed 
information on the line profiles will identify and constrain 
feasible formation mechanisms. 

The plan of the paper is as follows. In Section we 
investigate the magnitude of different effects that influence 
any velocity distribution measurement, such as limited sam- 
pling, observational uncertainties and - for line of sight dis- 
tributions - apparent rotation due to systematic proper mo- 
tion. In Section[3] we construct suitable two-parameter fam- 
ilies of distributions to use in a Bayesian likelihood. We de- 
scribe the method through which we control the statisti- 
cal meaningfulness of the maximum likelihood fit. Section [4] 
deals with the application of the maximum likelihood mea- 
surements of the higher velocity moments to the dSphs. 



2 ACHIEVABLE ACCURACY 

Reconstructing the intrinsic velocity distribution Sf(v) of 
a stellar system is a complex task, since several different 
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Figure 1. Accuracy limits from limited sampling. Upper panel: 
the evolution of the standard deviation of the estimated (normal- 
ized) mean fj, (full line) and dispersion a (dashed line) with the 
sample size N. In red, the representative behaviour of 1/yN is 
shown. Lower panel: the evolution of the standard deviation of 
the Gauss-Hermite moments /13 (full line) and /i4 (dashed line) 
with the sample size N. 



effects can modify the signal that we actually observe. The 
purpose of this Section is to quantify these contributions. 



2.1 Intrinsic noise from limited sampling 

The most obvious difficulty in measuring the shape of the 
velocity distribution in astrophysical systems is limited sam- 
pling. Finite samples of velocities naturally introduce some 
noise, which may be able to alter completely the intrinsic 
shape. 

In general, the magnitude of the noise has its strongest 
dependence on the sample size N . Hence, the number of 
available kinematic tracers poses a limit to the level of 
achievable accuracy. Some, smaller dependence can also be 
ascribed to the method we use to measure such a shape. 
In this Section, we use the Gauss-Hermite expansion intro - 
duced in lGerhardl (|l993t ) and lvan der Marel fc Franxl (|l993h . 
Later, we will compare these results to our maximum likeli- 
hood method. 

As a representative case, we assume that the intrinsic 
velocity distribution Jzf is a perfect Gaussian, Sf(/j,i n t, 0mt), 
where /ij nt and <jj nt are its intrinsic mean and disper- 
sion. We generate synthetic samples of size N, namely 
V = {«!,••• ,vm}, which share the distribution izf(w) = 
<S (/lint, Omt)- For each of them, we measure the standard set 
of properties: (fj,, a, /13, hi), namely the estimated mean, dis- 
persion, and the first non-zero Gauss-Hermite moments, /13 
and /i4- 

For the sake of clarity, the pair a) identifies the best 
Gaussian fit ^(n, cr) to the binned dataset V (we do not 
report results related to the much less interesting normal- 
ization of the Gau ssian fit). The Gauss-Hermite moments 
are computed as in Ivan der Marel fc Franxl l| 19951 ). hence hi 
and /12 are identically zero. We adapt the size of the bin Sbm 



to the size of the sample N with the standard prescription 
Sbm oc A -1 / 3 . The bins are centered on the sample mean, 
and, as a reference, for our smallest sample size N — 25 we 
use ss 4 bins within the interval (— a, a). 

For any sample size, we quantify the noise due to lim- 
ited sampling by repeating this measurement procedure on 
a large number of synthetic samples. Fig. [T] shows as a func- 
tion of sample size N, the variation in the standard devia- 
tion (StD) of the estimated (normalized) mean fj,/<Ti n t, (nor- 
malized) dispersion a/aint, and Gauss-Hermite moments /13 
and /14. As expected, StD^/crjnt) and StD(a/(Ji n t) follow 
approximately the reference prescription StD~ 1 / y~N. Note 
though, that while StD(a /a- lnt ) < StD(/i/cri nt ), it does not 
achieve the statistical prescription StD(a) ~ l/y2N. More 
significant, however, is the result for the Gauss-Hermite mo- 
ments. We find that, up to a sample size of N — 200, 
the magnitude of the noise (StD(/i3) S3 StD(/i4) S3 0.05) is 
higher than the amount of intrinsic signal that one, in gen- 
eral, can typically expect to find in galactic astronomy. This 
result is approximately independent of the intrinsic shape 
of the velocity distribution: we find that the accuracy lim- 
its quantified here remain substantially unchanged for syn- 
thetic datasets extracted from non-Gaussian distributions. 
As a consequence, for sample sizes less than 200, the accu- 
racy of any measurement is potentially very low, which casts 
doubts on the reliability of results obtained using just a few 
tens of tracers. 

In fact, the situation is even worse, as the intrinsic signal 
in _Sf is also attenuated by the observational uncertainties 
on the discrete kinematic measurements. Thus, it is highly 
likely that any deviation from Gaussianity detected in small 
samples is an artefact of under-sampling and/or binning, 
rather than being real. We conclude that it is extremely dif- 
ficult to measure reliably the shape of any velocity distribu- 
tion Jzf with a sample size that is significantly smaller than 
N = 200. In the following, we will only consider samples 
with N > 200. 



2.2 Signal attenuation by observational 
uncertainties 

Inevitably, any real dataset V has its own set of observa- 
tional uncertainties A = {Si, ■ ■ ■ , Sm}- Their effect is to al- 
ter the observed velocity distribution, so that the i-th star is 
in fact associated with the velocity distribution Jzf *5f(0, Si), 
rather than with the intrinsic J? itself. By Jzf * 9 , we indi- 
cate the convolution of the velocity distribution Jzf with the 
Gaussian kernel C S: 



[JSf. 



exp 



dx Jzf (a:) 



1 / v — X \ 

2 1 ^ ) 



(2) 



Unsurprisingly, the effect of this convolution is to attenuate 
the features of Jzf . 

For a given velocity distribution Jzf(u), the magnitude 
of this attenuation is a function of the dimensionless ratio 
observational uncertainty to the intrinsic dispersion 5/c>i n t 
only. For any sample size N, it is impossible to resolve the 
intrinsic velocity distribution Jzf with the Gauss-Hermite 
method used in previous Section \2. II Rather, the measured 
signal is the one corresponding to Jzf * Sf(0, 8 m ), where 8 m 
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Figure 2. The effect of observational uncertainties. Left panels: 
the black full curves show the evolution of Jzf * W for increas- 
ing levels of observational uncertainty. Jzf is the intrinsic veloc- 
ity distribution displayed in red. The adopted values of 5/oi nt 
are {0.2, 0.5, 1}. Right panels: the corresponding evolution of the 
Gauss-Hcrmitc moments /13 (full line) and /14 (dashed line). Blue 
vertical lines illustrate the level of observational uncertainty, from 
left to right , of the Fornax , Sculptor, Sextans and Carina datasets 
from lWalker et al.l l |200gh . 



is, approximately, the mean of the sample of uncertainties 
A. 

As an example, let us consider two arbitrary velocity 
distributions jSf (t?) both sharing the same amount of non- 
Gaussianity as characterized by their Gauss-Hermite expan- 
sion: the first has (/i3,/i4) = (—0.07,-0.1), and the second 
has (/i3,/i4) = (0.07,0.1). They are displayed as red full 
curves in the upper and lower left panels of Fig. [5] The 
right panels show the evolution of the Gauss-Hermite mo- 
ments /13 (full line) and /14 (dashed line) as functions of 
observational uncertainty on the kinematic measurements 
S faint- Clearly, these are monotonic functions that tend to 
zero for both Gauss-Hermite moments - that is to a perfect 
Gaussianity - when 5/aint > 1 or when the uncertainty is 
high enough to overwhelm any signal in Jz? . The black curves 
in the left panels illustrate this process by showing Jzf * 
when 5/<Tint 6 {0.2,0.5,1}. 

From Fig. [2] it is clear that the effect of attenuation 
is significant. The vertical lines in the right panels show 
the levels of average observational uncertainty 5 m /ffixxt for 
th e datasets on the G alactic dwarf spheroidals presented 
bv lWalker etlll l|200gh . Specifically, from the lowest to high- 
est levels of uncertainty, we find 



Fornax : 


S m /Ui n t P 


a 0.22 
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Figure 3. The effect of apparent rotation. Left panel: an in- 
trinsic Gaussian velocity distribution Jz? = W(0, <T int ) (red curve) 
disturbed by apparent rotation of growing amplitude Vapp/oi nt 6 
{0.2,0.5,1,2} (black curves). Right panels: the evolution of the 
dispersion a of the best-fitting Gaussian and of the first nonzero 
Gauss-Hermite moment /14 for different Vapp/oint- 



= 0.1) would remain smaller than (or comparable with) 
the noise from limited sampling up to N pa 100. On the 
other hand, by reversing the argument, if the typical ultra- 
faint has a kinematic sample of N ;$ 100, any signal would 
be overwhlemed by the shot noise unless 5 m /cint ~ 0.25. Al- 
though i mportant improveme nt has been achieved (see for 
example IKqposov et al.ll201lT ). significantly larger datasets 
would be necessary to resolve the line profile information in 
such systems. Finally, let us note that if we compare the 
observed shape of jf * with that of theoretical dynami- 
cal models, we must account for this inevitable attenuation 
effect. Whether using a Gauss-Hermite expansion or a non- 
param etric reconstruction of the velocity distribution Jf(v) 
- as m IJardel fc GebhardtJ |2012l) - this attenuation must be 
applied to the dynamical models themselves before compar- 
ison with the observables. On the other hand, a maximum 
likelihood approach allows us to take into account the ob- 
servational uncertainties while deriving our observables, and 
thus we are able to reconstruct the properties of Jz? itself, 
rather than those of Jz? * 'S . 



2.3 Apparent rotation from global proper motion 

The exploitation of the projection effect that causes an ex- 
tended object in the sky to have an apparent line-of-sight 
solid-body rotation as a conse quence of its glob al proper 



ICQ. 

motion has a long history (e.g.. iFeast et al. 1961). Very re - 



Taking the case of Sculptor as an example and consulting 
Fig. [1] it is clear that even the strong signal adopted here 



cently, the effect has been exploited by I Walker et all (|2008l ) 
to derive the systemic proper motion of the Fornax, Sculptor 
Carina and Sextans dSphs. 

For our purposes, apparent rotation is a potentially dan- 
gerous effect because it alters the shape of the observed ve- 
locity distribution. In particular, for the line of sight velocity 
distribution Jz? , the contribution of apparent rotation is de- 
generate with the signal produced by tangential anisotropy. 
For this reason, whenever possible, apparent rotation is usu- 
ally subtracted from the dataset before estimating the shape 
of ££ '. Nonetheless, it is useful to have an understanding of 
this spurious effect, since the subtraction of the apparent 
velocity field from the dataset does come at a price. It may, 
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in fact, not be worthwhile degrading each discrete velocity 
measurement with the uncertainty of the apparent velocity 
field if the effect on Jz? is expected to be exceedingly small. 

For simplicity, we consider the case in which the sample 
of kinematic tracers is uniformly distributed on a circle. This 
assumption mimics the more realistic case of a thin circular 
annuhis. Let us use the notation 



, (R, 0) = c D ft- R = V apP sin(6>-0 a 



(3) 



where jl is the proper motion, D is the distance, R is the 
projected distance on the sky (measured in arcmin), # ap p 
identifies the apparent rotation axis, and c is a constant 
(c = f.379 • 1CP 5 (km century)/(s kpc arcmin mas)). Hence, 
Vapp represents the maximum apparent rotation velocity at- 
tainable at the projected radius R. If Jz? is the intrinsic 
velocity distribution of the tracers on the circle with ra- 
dius R, then the effective velocity distribution we observe is 
^* r(Va PP ), where 



(ttKfWI - «VK| P ) 1 if KKppI < 1 
if |«/K P p| > 1 



(4) 

is the velocity distribution associated with the apparent ro- 
tation. 

The magnitude of the deviations from Jzf introduced 
by the convolution is a function of the ratio Vapp/oi nt 
only. Fig. [3] illustrates the representative case in which 
Jz? = 5f(0,o"int). The left panel illustrates four different ve- 
locity distributions C S * ^(Vkpp), obtained for l^pp/crint G 
{0.2,0.5,1,2}. As anticipated, deviations from the pure 
Gaussian case (in red) are towards a flat topped veloc- 
ity distribution and even a double peaked structure ap- 
pears at higher levels of apparent rotation. Both these fea- 
tures are usually considered indicative of tangential velocity 
anisotropy, though clearly this is not the case here. The pan- 
els on the right quantify the effect of apparent rotation on 
the dispersion of the best fitting Gaussian a and on the 
Gauss-Hermite moment /14. Since both ^ and 'f are even 
functions, hz is identically zero. Both a and /14 are more 
strongly modified as the magnitude of the apparent rota- 
tion increases. 

For the dSphs considered later in this paper, the ef- 
fect is actually very small. Sculptor and Fornax are the only 
dSphs where a reliable non-zero estimate of the rotation sig- 
nal can presently be obtained. Even at large distances R, the 
amount of apparent rotation Vapp (less than a few kms -1 ) 
still remains just a fraction of the intrinsic dispersion <7j nt 
(typically ~ 10 kms" 1 ). Unless we have a very large sam- 
ple size, the modifications introduced by apparent rotation 
are smaller than the achievable accuracy. As Fig. [3] shows, 
this is particularly true for the Gauss-Hermite moment /14, 
which remains virtually unaffected up to V app /ai n t ~ 0.7, 
even with sample sizes as large as iV = 800. 

This is clearly not a general rule, and Fig. [3] can be 
used to assess other cases, such as nearby globular clusters. 
Also, there may be reasons that justify the subtraction of 
the apparent rotation from the kinematic dataset, as for 
example if the considered annuli are highly non-uniformly 
populated, or if other spatial regions are considered in the 
place of annuli, or if the estimate of the apparent velocity 
field is precise enough. 



3 MAXIMUM LIKELIHOOD METHOD 
3.1 Introduction 

Suppose we have a set of iV kinematic tracers with velocities 
V = {vi, ■ ■ ■ ,vm}, which are aligned along the same axis, 
for example, the line of sight direction. We assume that they 
sample a specified spatial region, the velocity distribution 
of which we want to determine. The set V is accompanied 
by the set of velocity uncertainties A = {<5i, ■ • ■ , <5jv}, and 
membership probabilities P — {pi, • • • ,pjv}- We take these 
probabilities as assigned constants, although in some cases 
the probability pi may be modelled as a function of the 
velocity Vi as well as of other observable quantities in order 
to identify foreground objects, separate stellar populations 
and so on. 

Now suppose we have at our disposal a family of ve- 
locity distributions ££{v). This is associated with a set of 
parameters Q — {0i,--- ,6j}. Within this family, we can 
recover the best statistical description of the sample V by 
maximizing the likelihood: 



(5) 



in which we have implicitly assumed that the distributions 
Jz? have unit integral. 

In the case of a Gauss-Hermite expansion, the set O 
comprises the dimensional pair (fi, a) of the best Gaussian 
fit, together with the series of dimensionless moments hj, 
truncated according to the size of the dataset as well as 
to the uncertainties of th e kinematic measurements. N ote 
that in the terminology of Ivan der Marel fe Franxl |l993t ). fj, 
and a represents the mean and dispersion of the best-fitting 
Gaussian. 

In this paper, however, we prefer to use \i and a to 
denote the first and second moment of the distribution Jzf : 



_£?(G;v) v dv 



&{Q;v) v 2 dv 



(6) 



(7) 



We can highlight the role of these two dimensional quantities 
in the likelihood (JSJ by separating them from the remaining 
shape parameters in O, which we group in the subset O s h, 
to yield 



JV 

L( e) = ?l [^(e sh ) * sf(o,*o] (^r 



(8) 



We have made explicit use of the fact that the distributions 
Jz?(O s h;«) have zero mean, unit integral and unit disper- 
sion. Finally, we will use the notation ee for the uncertainty 
of the parameter 9 6 9. This uncertainty is defined by the 
68% confidence region associated with the marginalized like- 
lihood. 

The implementation of a maximum likelihood method 
for measuring the shape of velocity distributions «Sf brings 
about a number of advantages. The most evident one is the 
elimination of any arbitrary aspect introduced by binning in 
velocity space. Of equal importance is the problem of obser- 
vational uncertainties. These are not easily - nor usually - 
taken into account by the binning procedure, hence making 
any measurement questionable. The method of maximum 
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likelihood instead furnishes a natural way to incorporate 
any uncertainty into the measurement procedure. Also, we 
can reconstruct directly the intrinsic velocity distribution 
,Sf, rather than the attenuated one J?f * Sf(0, 8m). This has 
two consequences. First, the limit in accuracy due to sam- 
pling (see Sect. I2.l"|l is less important, since intrinsic signals 
are stronger. Second, observables obtained in this way can 
be directly compared with dynamical models. This is not 
possible in general, since if Jf * &(Q,5 m ) is reconstructed, 
this should be compared with an analogous quantity which 
is only indirectly provided by the models. 

Given these advantages, it is natural to look for an 
implementation of the maximum likelihood approach using 
the standard Gauss- Hermite expansion . Such an approach 
has been proposed in Ivan de Ven et al.l |2006l ) and used in 
iRangwala et al.l (|2009l ) to characterize the kinematics of the 
Galactic bar. Particular attention must be devoted to the 
tails of the velocity distributions, as a truncated Gauss- 
Hermite series is not always positive definite and so cannot 
be used as a probability distribution. In order to circumvent 
these difficulties, we construct alternative families of proba- 
bility distributions ^f(0 s h;«), and it is to this problem we 
now turn. 



3.2 A Two-Parameter Family of Velocity 
Distributions 

For velocity distributions Jzf, it is useful to start with a 
Gaussian profile Sf , since it represents a good approximation 
for most realistic cases. We maintain this perspective, but 
adopt two additional parameters to measure the deviations 
from a pure Gaussian profile. The parameter s quantifies 
the magnitude of the symmetric deviations, the parameter 
a the magnitude of asymmetric deviations. This is in addi- 
tion to the parameters fi and a, which are the mean and the 
dispersion of any distribution «Sf . For the sake of clarity: 



= {n, a} U 6sh = {n, tr, s, a} 



(9) 



As in any parametric approach, the choice of the adopted 
model is a crucial step. In the next subsections, we propose 
families of velocity distributions encompassing a wide range 
of deviations from the Gaussian profile seen in typical stellar 
dynamical systems. 




Figure 4. Our one-parameter family of symmetric velocity dis- 
tributions. The red profile illustrates a perfect Gaussian; dashed 
profiles display negative values of s, ranging in the interval [—4, 0); 
dotdashed profiles display positive values of s ranging in the in- 
terval (0,0.8]. 




3.2.1 Symmetric deviations 

In order to build a set of representative, symmetric, 
non-Gaussian velocity distributions, we exploit the three- 
dimensional distribution of velocities 



f(v r 



v t 



■ exp 



v 2 r + \v t \ 2 



2a? 



(10) 



in which v r and vt are respectively the radial and tangential 
components of the velocity, s is our free parameter for sym- 
metric deviations (s = identifies the Gaussian case) and 
p(s) is defined so that J fd 3 v — 1: 



p(s) = 



1-KO~t (2(7 J?) 



r(i 



(ii) 



The simple model of eqn (|10p is familiar from con- 
stant anisotropy models f3 = s, where, with the usual 



Figure 5. An illustration of our final parametrization of asym- 
metric deviations. In each of the three panels, which display 
the cases s S {0.5,0,-0.5}, the red profile illustrates the sym- 
metric distribution _£f(s,a = 0;v); black profiles display the 
growth of asymmetries as measured by our parameter a, for 
a e {0.1,0.3,0.7}. 



notation, f} = 1 — of/2<7, 2 . In particular, these three- 
dimensional or intrinsic velocity distributions are the con- 
stant anisotropy phase sp ace distribution functions f or the 
isothermal sphere (see e.g., [Gerhard 1991; Evans 1994). This 
seems a natural starting point for galaxies with flattish ve- 
locity dispersion profiles, for which we might plausibly ex- 
pect the intrinsic velocity distributions to be reasonably sim- 
ilar. 

Our family of symmetric velocity distributions corre- 
sponds to a set of line of sight velocity distributions gener- 
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ated by /. The direction associated with the line of sight 
identifies the velocities and v±. If tp is the angle defined 
by the line of sight and its projection onto the plane of the 
tangential velocity Vt, we have that 

{V r = \v± | COs(a) COs{tp) + W|| sin^) 
vg = |«x| cos(a) s'm(<p) — V\\ cos{tp) . (12) 
v$ = |«j_|sin(a) 

This set of linear transformations allows us to compute the 
line of sight distribution generated by / for any direction tp: 

fios(y\\,(p) = j d 2 tfj_ / [v r (v\\,vj_),v t (v\\, vx)] , (13) 

poo r2iv 

= / dv± / da v ± f(v r ,\v t \) . 
Jo Jo 

Given the properties of the pressure tensor of /, the line of 
sight distribution /i os has a dispersion 

(14) 



(15) 



o~\\{tp) = ov\/l ~~ scos(tp) 2 . 

Hence, the distributions defined by 

&{s,tp;v) = cry {tp) ■ fios [<T\\ {<p) ■ v] 

have by construction zero mean, unit integral and unit dis- 
persion, and are well-suited for our maximum likelihood 
method. 

By reporting explicitly all functional dependences in 
eqn (|15l) , we highlight the fact that the distributions Jzf have 
two parameters, namely the shape s and the angle tp. The 
parameter s is associated with genuine deviations from the 
Gaussian profile, while the effect of varying tp between and 
7r/2 at fixed s is to erase these deviations {tp = tt/2 iden- 
tifies the radial direction, whose line of sight distribution is 
Gaussian for any s). For this reason, s and tp cannot be main- 
tained as independent parameters - since they are strongly 
correlated - and a prescription of the form tp — tp{s) is 
needed as a 'closure'. Different prescriptions introduce small 
differences in the resulting family of distributions, but in this 
paper we adopt 

cos[<p(s)] = 15/(16 - s) (16) 

for two different reasons. First, for positive s, this allows 
us to keep non Gaussianities as strong as possible when s 
approaches its upper limit of unity tp{s = 1) = 0. At the 
same time, we avoid setting tp uniformly to zero, since this 
produces distributions with strongly pronounced peaks also 
for much smaller values of s. Second, the closure condition 
(|16|) allows us, for negative values of s, to provide a range 
of flat topped distributions before a double peaked struc- 
ture appears. Flat topped distributions are almost absent 
if tp is uniformly set to zero. Fig. [4] illustrates the family 
of symmetric distributions jSf we have defined here. Both 
the characteristic extremes of a spiky distribution with sub- 
stantial tails and of a double peaked structure with sharp 
edges can be clearly identified within the displayed range 
—4 ^ s ^ 1. Between such extremes, the entire range of 
intermediate configurations is accessible as well. 

3.2.2 Asymmetric deviations 

Asymmetric distributions can be derived from our symmet- 
ric distributions 5£{s\ v) by the transformation 

£"{s,a;v) = J?{s;X{s,a;v)). (17) 
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Figure 6. A comparison with Gauss-Hermite moments. The up- 
per and lower panels show contours of the Gauss-Hermite mo- 
ments /13 and hi for our two parameter family of distributions. 
To guide the eye, the contours corresponding to and to 0.1 are 
marked, respectively, in red and dashed lines. 



The asymmetric deviations are introduced by the map v 
X(s,a;u). The basic ingredients of the function X enabling 
it to deliver well behaved distributions within the entire pa- 
rameter space (s,a) are described in Appendix lAl Here, we 
only report our choice: 



X ; 



A 1/2 + 3M ) 



l+s/6+|a| 



(18) 

While introducing asymmetries, the application of the trans- 
formation (|18[) to the symmetric distributions S£ also alters 
the normalization, so that in general ££' is no longer a zero- 
mean, unit-integral and unit-dispersion distribution. How- 
ever, it is straightforward to account for these matters and 
we define our final two-parameters family as 



(s, a; v) = — -S?'(s, a; a' 



(19) 



where fi' , a' and I' are respectively the mean, dispersion and 
integral of the distribution Jd" in eqn (|17|) . Fig [S] displays a 
few examples of asymmetric velocity distributions contained 
in our two-parameters family. The different panels illustrate 
the asymmetric deviations caused by positive values of a for 
three different values of s G {0.5, 0,-1}. 



3.2.3 The Gauss-Hermite moments 

To establish a quantitative comparison with the standard 
Gauss-Hermite expansions, we measure the first two nonzero 
moments /13 and hi of our family of distributions. Each of 
them is a function of our two parameters s and a, namely 



h 3 = a/\a\ H 3 {s, \a\) 



hi — Hi{s, \a\) 



(20) 



Contours of H 3 and Hi in the (s, a) plane are displayed 
in the upper and lower panel of Fig [6] In both cases, the 
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Figure 7. Upper panels: Accuracy limits from limited sampling 
with the maximum likelihood method (empty circles). The stan- 
dard deviation of estimated (normalized) mean fi (full line) and 
dispersion <r (dashed line) on the left; the standard deviation of 
the Gauss-Hermite moments /13 (full line) and hi (dashed line) 
on the right. Lower panel: the precision test on the measurements 
obtained through the maximum likelihood method; coding as in 
the upper panels. 
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Figure 8. The average (_S?) (upper panel) and standard deviation 
StD((_Sf}) (lower panel) for different symmetric deviations s. In 
both panels, full lines represent the case (s, a = 0), while dashed 
lines are (s, \a\ = 0.5). 



reference values of and 0.1 are highlighted respectively by 
a red full line and a dashed line to guide the eye. 

There are some aspects worth noting. The amount of 
asymmetric deviation as quantified by h$ is a function of 
both a and s. As is evident from the contours of H3, it is not 
possible to define a one-to-one correspondence a «-» /13. How- 
ever, at least in the vicinity of the Gaussian profile, this is 
almost the case for symmetric deviations. Here, Hi displays 
vertical contours, characteristic of a one-to-one correspon- 
dence s <rf hi. Nonetheless, some deviations are apparent 
for large negative values of s. Notice too that there are two 
distinct countours Hi = 0.1, intersecting the a — axis at 
different positive values of s. Rather than being an issue for 
our family of distributions, this feature is due to the inabil- 
ity of the Gauss-Hermite moment hi to describe large de- 
viations from Gaussianity. Higher Gauss-Hermite moments 
are required to describe these distributions. 

3.3 Tests of Accuracy and Precision 

To evaluate the performance of the maximum likelihood ap- 
proach, we test its accuracy and precision, in a similar man- 
ner to Sect. 12.11 for the Gauss-Hermite series. In order to 
establish a direct comparison, we use as an intrinsic distri- 
bution of the synthetic datasets a perfect Gaussian Ji?(v) = 
Sf(/iint, o"int)- Also, we convert s and a into measurements of 
hz and hi by using the transformations (|20p . All measure- 
ments are obtained by using a Metropolis-Hastings proce- 
dure, which allows us to scan efficiently the 4-dimensional 
parameter space defined by our parametrization. 

Our results are collected in Fig. [7] The upper panels 
display the accuracy test, and are analogous to the panels 
of Fig. [T] The results obtained for the maximum likelihood 
method are denoted by empty circles. It is evident that both 
StD(/i) and StD (a) follow very closely their respective sta- 
tistical prescriptions (1/y/W and l/\/2N , in red). Hence, the 
method achieves the maximum measuring power allowed by 
the sample size. As for the deviations from Gaussianity, both 



StD(/i3) and StD(/u) are substantially smaller than in the 
binned case of Sect. 12.11 Experiments with non-Gaussian 
intrinsic velocity distributions show an even smaller shot 
noise, athough with qualitatively similar figures. The rela- 
tive gain in accuracy for the detections of symmetric devi- 
ations is found to be an increasing function of the sample 
size, reaching approximately 2 for N = 800 and surpassing 
2 for asymmetric deviations. These quantities refer to the 
idealized case of datasets with no observational uncertainty 
(Si — 0) and uniform certainty of membership (p, — 1). 
Hence, they represent only lower bounds for the actual gains 
that are achievable in any real case. 

Lower panels display the precision test, which evaluates 
the reliability of the uncertainties returned by the maximum 
likelihood procedure. The (xt) quantity in the plots repre- 
sents the average (over the number of performed tests) for 
the quantity 

-=(^H 2 ' (2i) 

where 6 stands for any parameter of the family of distri- 
butions and #i nt is its intrinsic, input value. We recall that 
eg denotes the uncertainty on the measured value of the 
parameter as returned by the marginalized likelihoocj]]. We 
find that the errors on \x and a, as well as those on s and a 
behave as desired, with all (Xe) averaging approximately to 
the expected value of unity. 

3.4 A Check on the Degree of Flexibility 

It is natural to raise the question: what if the intrinsic dis- 
tribution is not included in our two-parameter family? This 
may represent the greatest disadvantage of the maximum 

1 These uncertainties are not symmetric in general, so eqn (12 1 1 is 
calculated by using the relevant higher or lower limit of the 68% 
confidence interval, depending on whether the best fitting value 
is larger or smaller than the intrinsic one. 
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Figure 9. A test of our statistical device. Upper panel: the input 
distribution in red and its best description within our family in 
black. Lower panel: evolution with the sample size of the average 
(on the series of tests) of the \ quantity defined in eqn ]26ll . 
Dashed line: the synthetic datasets are distributed according to 
the red input distribution shown in the upper panel; full line: the 
synthetic datasets are distributed according to the black best-fit 
distribution shown in the upper panel. 



likelihood implementation, because for extremely high sam- 
ple sizes and observational precision of kinematic measure- 
ments, the standard Gauss-Hermite expansion can be made 
as flexible as necessary by adding higher order terms. How- 
ever, it is possible to set up an efficient device that controls 
whether the family of distributions is appropriate and flexi- 
ble enough. 

For the Gauss-Hermite series, this device is represented 
by Myller-Lebedeff 's theorem, which equates the integral of 
the residuals between the observed distribution and its best 
Gaussian fit with the sum of the Gauss-Hermite momen ts 
themselves (eqns (12-14) in Ivan der Marel fc Franxl fl993)). 
This allows us to check whether the adopted truncation of 
the Gauss-Hermite series is completely satisfactory, and if 
further higher order terms are required. 

Within our maximum likelihood approach, it is neces- 
sary to ask a more purely statistical question: given the ob- 
served sample V (which comes together with the associated 
uncertainties A, the membership probabilities P, and its 
sample size N) and the distribution Jzf that - within the 
considered family - provides the maximum likelihood, would 
an analogous sample, actually extracted from the same J£ ', 
be fitted significantly better? It is possible to answer this 
question quite easily in an analytic way. 

Let us suppose that is the set of parameters that 
provides the maximum likelihood for the sample V , accom- 
panied by A and P; 



L(Q) = Y[ Pi [jSf * Sf(0, &)] (fi) 



(22) 



The value of L(O) can be compared with the characteristic 
value of the analogous product in eqn (|22[) hi which the 
velocities Vi are actually extracted from the distribution given 



by the set of O: 

N N - 

(L(Q)) = (l[p i Jf*^)=l[ Pi / [J?*^(0,^)] 2 . (23) 

i=l i=l ^ 

Since both quantities defined by eqns (|22[1 and (|23p con- 
verge to zero quickly with N, we find it more convenient 



to consider their nonvanishing counterparts d L(Q) and 



( yL(Q)). In order to ease the notation, we use the sim- 
plification 



L(G)) 



L(Q) = L 



(24) 



If the distance between L and (L) can be accounted for by 
the natural scatter introduced by the sample size N only, 
then the distribution given by the set Q provides a statisti- 
cally perfect description of the sample V . This natural scat- 
ter is clearly given by 



StD [(£}] = y (L(Q) 2 / N ) — (L) 2 
hence the quantity we are interested in is 
X =(Z-(L»/StD[(L>] . 



(25) 



(26) 



Values of \ 2 up to unity indicate that the sample V is sta- 
tistically well described. Negative values of x> with abso- 
lute value significantly larger than unity, indicate that the 
adopted parametrization is not able to provide a good sta- 
tistical description of the sample. 

To apply this criterion, we need explicit expressions 
for both (L) and StD [{V}]. It is useful to note that for a 
fixed set of parameters O, both (L) and StD [{L}] are in- 
variant with respect to a change in fj,, which we can ig- 
nore. Also, if we indicate with {L(a = 1, O s h) 1//JV ) and 

StD \{L(a = 1, 6 sh ) 1/JV )] , the values attained for a = 1 (all 

others 8j, A and P fixed), then for a general a it is easy to 
verify that 



(L(a,§ Eh ) 1/N ) = 
StD [<L(a,e sh ) 1/JV >] = 



{L(a = l,Q sh ) 1/N )/o- (27) 
StD [{L(cr= l,e sh ) 1/JV )] /a , 



where the uncertainties are scaled accordingly, i.e., 6i — > 
Si/a. As a consequence, we can restrict the problem to the 
case cr = l. 

We use now the fact that, for large N, the different 
convolutions can be accounted for by the mean 8 m of the 
sample of uncertainties A, and after some algebra, we find 
the following asymptotic expressions, valid for the case a = 
1: 



StD [<L>] 



rjprflif^to,^)] 1 ^ 

p gm exp(4) + 0(1/N) 



(28) 

, 1/2 



= p gm exp(A) \[B - A 2 /^fN + 0(1/N) (29) 
in which p gm is the geometric mean of the sample's mem- 
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bership probabilities P 



(30) 



and A and B axe the following simple integrals (we implicitly 
assume that 9 = ^(0, Sm/ff)) 



a = A{a = i,e sh ) = 

B = B(a = l,§ sh ) = 



(Jf * «f ) log (J? * Sf) (31) 
(Jzf *Sf)log(«Sf *Sf) 2 . (32) 



Eqns. (|28l - I32|l allow us to compute directly, for any value 
of the parameters 6j , all necessary ingredients to compute \ 
in eqn (|26p . and hence to understand whether the fit of the 
observed sample is indeed statistically good. 

As a reference and an example, we consider here the 
simplified case in which there is no observational uncer- 
tainty, S m — 0, and the likelihood is maximized by the dis- 
tribution «Sf, having dispersion a = 1. With a slight abuse 
of notation, we use (J?) to indicate the average of the like- 
lihood in the sense of eqn (|23p and StD((^)) to indicate its 
standard deviation, as in eqn (|25|l . For the Gaussian case, 
Jzf = ^"(Mi a — l)i both integrals A and B are analytic and 
we find 



(Sf> 
N StD«Sf}) 



0.24197 
0.1711 . 



(33) 



Deviations in J? from the Gaussian profile determine devi- 
ations in the average (Jr?) as well as in the corresponding 
standard deviation StD((«5?}). Fig. [8] displays the behaviour 
of (Jgf) and V / lVStD((jSf» for the cases (s,a = 0) (full line) 
and (s,\a\ = 0.5) (dashed line). Both the displayed quanti- 
ties increase significantly for positive values of s, due to the 
change in shape of the associated distributions. 

Finally, Figure [9] illustrates a practical test. The distri- 
bution in red in the upper panel is used to produce syn- 
thetic datasets of different sample sizes, which are then fed 
to our maximum likelihood formalism. This distribution is 
not included in our parametric family, and, as a comparison, 
the distribution in black in the same panel displays its best 
fit. We perform a large number of tests for different sample 
sizes and record the evolution of the average of the quantity 
X, computed at each test, in the lower panel of the same 
Figure, as a dashed line. For small sample sizes, it is almost 
impossible to distinguish the two distributions. Nonetheless, 
as the sample size increases, the properties of the input dis- 
tribution become more evident and cannot be completely 
reproduced within our family, so that \ reaches a value of 
—2 for TV = 800. The black full line in the lower panel shows, 
for comparison, the average of the x quantity that we obtain 
for synthetic samples that are drawn directly from the best 
fit distribution, and that average to the expected value of 
zero for any sample size. 
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Figure 10. The positional distribution of kinem atic measure- 
ments in Fornax as from the dataset presented by IWalker et al.l 
(2009). The position angle 6ma is displ ayed in red, while the ap- 
parent rotation axis 6 ap p (Piatck et al. 2 0071) is shown in blue. 




Figure 12. Results for the Fornax dSph: angular sectors. Upper 
panel: datapoints display the mean /i; the black full line shows the 
prediction for fi determin ed by the astrome tric measurement of 
proper motion measured jPiatek et al.ll2007f) under the assump- 
tion of no streaming motion; the red full line displays the correc- 
tion introduced by intrinsic rotation. Lower panel: the asymmetric 
deviations. 



4 APPLICATIONS: THE GALACTIC DSPHS 
4.1 Fornax 

The kinematic sample presented by IWalker et all (|2009l ) 
consists of 2409 measurements for stars with a probability 
of membership higher than 0.9. We include this information 



in our likelihood @, and we discard measurements with a 
smaller membership probability. As already found in Sec- 
tion 12.21 this kinematic sample comes with a (normalized) 
level of uncertainty of S m /a ~ 0.22, which is the smallest in 
the currently available selection of dSphs. To construct our 
set of observables, we transform the coordinates of the stars 
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in the plane of the sky with 

Xi = — c cos Si sin(a?i — Qo) (34) 
Hi = c [sin5i cos^o — cos^i sin^o cos(ai — ao)] , (35) 

in which c is a constant (= 10800/7T for coordi- 
nates in arcminutes). We a dopt the coord inates (J2000) 
of Fornax's center as in iMateo I (|l998h . (a ,5 ) = 
(2 h 39 m 59 s ,-34°27.0')- The photometric elhpticity (e = 
0.30) and t he major axis position angle (#ma = 46.8°) are 
taken from (jBattaglia et al.ll2006l ). Fig. 1101 shows the result- 
ing spatial distribution of the kinematic measurements on 
the plane of the sky, with the angle 8ma highlighted in red. 



4-1.1 Symmetric deviations 

Given the number of available kinematic measurements and 
the results of our accuracy tests, we consider a set of different 
subsample sizes for our measurements: N £ {350, 500, 800}. 
We experimented with both circular and elliptical annuli, 
but we have found no significant difference, and hence re- 
port results for the circular annuli only. For a comparison, 
we also consider results for minor axis and a major axis 
regions. Each of these is defined as the sum of the two op- 
posite Cartesian quadrants centered on the relevant axis. 
Given the smaller number of stars in each of these regions, 
only iV 6 {350, 500} were considered. As we demonstrated 
in Sect. 2.3, our measurements of the line profiles in circular 
annuli are not affected by apparent rotation. 

Results for symmetric deviations s and velocity disper- 
sion a are displayed in the upper and lower panels of Fig llll 
In both rows, the left panels (in shades of grey) illustrate 
the results for circular annuli, while middle panels show the 
division according to the major and minor axes regions (re- 
spectively, in shades of blue and red) . Different shades of the 
same colour, and different sizes of the corresponding data- 
point, are used to indicate the sample size used for each mea- 



surement, with larger sizes associated to darker and larger 
datapoints. The upper-right panel translates the symmetric 
deviations in terms of the Gauss Hermite coefficient Yia ■ The 
lower-right panel displays the value of the quantity \ corre- 
sponding to each single maximum likelihood measurement. 

The symmetric deviations display a clear evolution from 
positive values of s (and h&) in the center of Fornax, to neg- 
ative values of s (and Ha) at larger radii, with a tentative 
return towards a more Gaussian profile at the end of the 
sample's radial coverage (~ 3 half-light radii). The major 
and minor axes regions show some differences and the mi- 
nor axis region displays a stronger (negative) signal for s. 
This may perhaps be consistent with an elliptical kinematic 
pattern following the isophotes. Nonetheless it is interesting 
to notice that most of the signal for flat-topped distributions 
is indeed coming from the minor axis region. We confirm a 
mildly declining dispersion profile in Fornax and also note 
a systematic difference between the major axis and minor 
axis regions, with the minor axis showing a lower line of 
sight velocity dispersion. 

If we were to interpret the res ult by following the sug- 
gestio ns of both Gerhard ( 1993]) and lvan der Marel fc Franxl 
i|l993l ). we would conclude that Fornax shows some degree 
of tangential anisotropy, at least outside its central regions. 
We notice that the tentative 'peak' of positive values of 
s (and /14) in the very centre may be interpreted in dif- 
ferent, possibly non exclusive, ways. The first interpreta- 
tion invokes the 'complementarity property', recognized by 
iDeionghel (|l987l ): a tangentially biased structure has flat- 
topped (s < 0) distributions outside some transition radius 
and a more spiky (s < 0) distribution in the central regions. 
The second interpretation is the existence of two populations 
with distinct kinematic properties. It is easy to see that the 
superposition of two approximate Gaussians with different 
widths would be recognized as a distribution with a positive 
s - the exact value of which is dependent on both the ratio of 
numbers and dispersions of the two superposing populations. 
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Finally, there may be place to accommodate a central inter- 
mediate mass black hole (IMBH). In this respect, it would 
be interesting to compare detail ed modelling of our results 
with the constraints obtained bv lJardel fc Gebhardtl (|2012h 
for the IMBH's mass. Regarding the issue of multiple stellar 
populations, we also note a systematic tendency of measure- 
ments in the inner parts of the system to have higher values 
of x- This is consistent with the fact that while at larger 
radii we are effectively modelling just the metal poor stellar 
population, towards the center we register the effects coming 
from two superposing populations. 

We do not report explicit results for the asymmetric de- 
viations a or for the mean \x for circular annuli. Unlike sym- 
metric deviations, asymmetric deviations average to zero 
over circular annuli. Our results confirm this expectation 
and we prefer to address the characterization of asymmet- 
ric deviations by considering a purely angular subdivision of 
the dataset, using angular sectors that make no reference to 
the distance from the centre. 
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Figure 13. The distribution of kinemati c measuremen t s in Sculp- 
tor as from the dataset presented by IWalker et al.l || 200S|) (in 
black) and from the dataset presented by Istarkenburg et all 
l|2010h in green. The position angle $ma is displayed in red. 



4-1.2 Asymmetric deviations and apparent rotation 

In the Fornax dSph, the astrometrically derived proper 
motion dPiatek et all 120071) agrees at approximately 1- 
sigma with the proper motion deduced using the kine- 
matic data under t he assumption of no streaming motions 
l| Walker et alj|2008f ). This testifies to the fact that, if any 
intrinsic rotation is present, it must be small by comparison 
with the velocity field given by the apparent rotation. How- 
ever, a precise measurement for both proper motion and ro- 
tation in dSphs is relevant for comparison with simulations, 
and for constraining the formation history of such systems. 
For this reason, we reconsider this issue here, and note that 
our ability to measure asymmetries in the line of sight pro- 
files can help us constrain the intrinsic rotation field. This is 
because, in the plane of the sky, asymmetries and intrinsic 
rotational velocities are likely to be strongly correlated. 

In the lower panel of Fig. 1121 we measure the the " asym- 
metry field" in angular sectors a{6) around Fornax's center 
(on subsamples N = 400). Although almost everyhere nearly 
zero, we do detect a 2-7r-periodic signal, which is indeed com- 
patible with an intrinsic rotation. Not all datapoints in the 
panel are independent, and hence we do not try to fit our 
result, but it is encouraging that the peaks of the signal are 
approximately aligned with the major axis of the system, 
which is displayed as red vertical lines, # poa k ~ #ma- This 
signal is not due to apparent rotation, and is robust against 
subtraction of the apparent rotation field. 

The datapoints in the upper panel of Fig. [12] show 
the associated mean in angular sectors (on subsam- 

ples N=350). This can be compared with the black full line 
in the same panel, which displays the apparent rotation that 
the astrometrically derived proper motion implies on the 2- 
dimensional distribution of kinematic measurements. It is 
clear that any disagreement is again correlated with the po- 
sition of the major axis, and has opposite signs in oppo- 
site directions, in a way that is compatible with intrinsic 
rotation. Unfortunately, despite the quality of the dataset, 
it is not possible to derive a statistically meaningful char- 
acterization of the two dimensional intrinsic velocity field. 
This implies, approximately, an intrinsic rotation of about 
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Figure 15. Results for the Sculptor dSph: angular sectors. Upper 
panel: datapoints display the mean /i; full lines show the predic- 
tion for the mean fi determi ned by the astrome tric measurement 
of proper motion measured l|Piatek et al 1 l2006fl and no intrinsic 
rotation (systematic velocities as from the datasamples). Lower 
panel: the asymmetric deviations. 



1 kms 1 for the outermost tracers aligned with the major 
axis in either directions. 



4.2 Sculptor 

The available sample from IWalker et al.l (|2009T ) contains 
1370 line of sight velocity measurements with member- 
ship probability higher than 0.5. We ad opt the coordi- 
nates (J2000) of the dSph's centre from iMateo I ljl99ct l: 
(a ,S ) = (l' , 00 m 09 s ,-33 o 42.5')- The photometric elliptic- 
ity e = 0.26 ± 0.01 and position angle 6> M a = -85.3° ± 0.9° 
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Figure 14. Results for the Sculptor dSph: circular annuli. First row: the symmetric deviations. Second row, left and middle panels: the 
dispersion. Second row, right panel: the quantity \ f° r an displayed maximum likelihood measurements. 



are taken from Ide Boer et al.l l|201ll 'l. As reported in Sec- 
tion 12.21 this kinematic sample comes with a (normalized) 
level of uncertainty of S m /a « 0.325. We accompany this 
kinem atic sample with the one p rovided by [ Battaglia et al.l 
|2008t ) and then recalibrated by IStarkenburg et all (|201of ) 
(results related with this datasets are displayed in green in 
all relevant Figures). To avoid misalignment between the 
catalogs, for t his second dat a set w e use the dSph center as 
determined in Ide Boer et ail (|201ll ). Even though the num- 
ber of line of sight velocity measurements is lower with 629 
giants, they cover a significantly more extended radial re- 
gion (see Fig I13|) . which makes the two datasets comple- 
mentary. Also, a smaller (normalized) level of uncertainty 
S m /a w 0.17 is achieved. 

Given the reduced number of kinematic tracers in com- 
parison to Fornax, we are forced to consider smaller sam- 
ple sizes: N 6 {350, 500} for the circular annuli and N £ 
{200, 350} for the major axis and minor axis regions. Re- 
sults for symmetric deviations s and velocity dispersion a 
are displayed respectively in the upper and lower panels of 
Fig 1141 The collected results refer to circular annuli, and 
again no significant differences were found for the case of 
elliptical annuli. 

The symmetric deviations display a marked preference 
for positive values of s (and hi) for the entire radial range 
covered by the tracers. This behaviour is confirmed by both 
datasets, which are found in perfect agreement. Identical 
profiles (within the uncertainties) are found for the major 
axis and minor axis regions. Taken at face value, these re- 
sults support a radi ally biased dynamical stru cture, which 
was also the result of lAmorisco fc Evans! |2012T i . The central 
peak in s as well as the tendency for higher values of \ to- 
wards the center mimic the case of Fornax, and hence point 
towards the effect of superposing populations, even though 
we cannot exclude other dynamical origins. 

We confirm the outwardly increasing velocity dispersion 
profile in Sculptor, although it should be remembered that 



the dataset from Battagl ia et al.l j2008) is not provided with 
probabilities of membership, hence the outermost points 
may probably be affected by contamination. Nonetheless, 
in the radial range where both datasets are available, they 
agree very well. Surprisingly, we find that the two datasets 
do not agree in the deduced means /i. The upper panel of 
Fig. [15] displays the mean in angular sectors fi(8). We detect 
very similar angular behaviour, but the two datasets seem 
to be shifted uniformly of about 1 kms -1 . Given that higher 
order moments all agree, we suspect that such a significant 
difference may be systematic in origin. Therefore, particular 
caution should be used when attempting to merge the two 
datasets. 

Unfortunately, neither of the two datasets displays a 
conclusive signal for the asymmetric deviations (see lower 
panel in Fig. I15|) . which does not allow us to make progress 
in the determination of Sculptor's proper motion or intrinsic 
rotation. It is known tha t the astrometric m easurement of 
Sculptor's proper motion IjPiatek et al.l l2006) does not agree 
with the kinematic rotation signal. This suggest either the 
presence of significant intrinsic rotation or perhaps an error 
in the astrometric measurement, and is exemplified by the 
disagreement between the datapoints and the full curves in 
the upper panel of Fig. 1151 Such curves, display the apparent 
rotation implied by Piatek's proper motion, and have been 
normalized to the systematic velocity derived separately by 
each dataset. In turn, even though roughly agreeing on its 
direction and qualitat ively with the rotation identified in 
iBattaglia et al.l (|200ct ). the two kinematic samples would 
suggest proper motions of different magnitudes, hence fur- 
ther observational effort will be necessary for progress. 

4.3 Carina and Sextans 

The sizes of the kinematic samples regarding the Sextans 
and Carina dSphs are significantly smaller than the two pre- 
vious cases, respectively with 449 and 780 stars with mem- 
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Figure 16. Results for the Carina (upper panels) and the Sextans (lower panels) dSphs. Left and middle panels in both rows: results 
for the asymmetric deviations. Right panels: velocity dispersion. 



bership probability higher than 0.5 (424 and 758 higher than 
0.9). Also, since the intrinsic line of sight velocity dispersion 
is smaller than in Fornax and Sculptor, the average level 
of uncertainty goes up, as reported in eqns ([3]). Since the 
apparent velocity fields are poorly constrained, we decide 
to avoid their subtraction. For completeness, we use the co- 
ordina tes of the centers of these dSphs as listed in iMateo I 
(1 19981 ). We use, respectively for Sextans and Carina, sample 
sizes of N = 200 and N € {175, 325}. 

Results for the symmetric deviations in circular annuli 
for both dSphs are shown in Fig. 1161 Both systems show 
a tendency (outside the innermost regions in the case of 
Carina) for line of sight distributions that are more peaked 
than Gaussian, which is compatible with a radial bias in 
their orbital structure. 



5 CONCLUSIONS 

We have devised an efficient method to extract the shape 
information for line profiles of discrete kinematic data. Such 
information is often crucial in constraining the orbital struc- 
ture of stellar systems. Independent knowledge of the orbital 
anisotropy is necessary to break the mass-anisotropy degen- 
eracy, and hence to constrain the mass density profile. Clear- 
cut determination of the mass profile at both very small and 
large radii is made challenging by such degeneracies, but 
nonetheless provides a crucial test for our picture of galaxy 
formation. Also, the orbital structure retains memory of the 
initial conditions in which the tracers were formed, and so 
constrains albeit indirectly the different galaxy formation 
mechanisms. 

Our methods are complementary to the standard 
Gauss-Hermite formalism, that is best suited for continuous 
data obtained from absorption line spectra. Discrete kine- 
matic measurements are affected by inhomogeneous uncer- 
tainties and often come with varied probabilities of mem- 



bership. The Gauss-Hermite formalism is unable to account 
for all this different information, in contrast to a Bayesian 
approach, which also allows us to avoid any binning proce- 
dure. 

Since the Gauss-Hermite series is not positive definite, 
it cannot be used as a probability distributions. Instead, we 
construct a new family of line profiles derived from veloc- 
ity distributions and use them in the context of Bayesian 
inference. Our family has two parameters, namely s which 
quantifies symmetric deviations from the Gaussian profile 
and a which refers to the asymmetric deviations. The pa- 
rameter s has a kinematic interpretation, and is associated 
with line profiles that mimic those of constant anisotropy 
models (with an exponential dependence on the energy). 

Our methods allow us to measure directly the intrinsic 
line profiles Jzf , rather than the profile convolved with the 
observational uncertainties. The advantage of such an ap- 
proach is substantial. Any signal of a deviation from Gaus- 
sianity is significantly stronger in the intrinsic line profile. 
Hence, a smaller sample size is sufficient to reach the level 
at which signal itself is larger than the shot noise. We quan- 
tify the magnitude of this noise as a function of the sam- 
ple size and find that, within the Gauss-Hermite formalism, 
this noise is higher than the expected signal in both h% and 
hi for sample sizes smaller than N~200. This casts doubts 
on measurements obtained on significantly smaller sample 
sizes, especially if important observational uncertainties are 
present. We find that our maximum likelihood methods per- 
form in comparison systematically better. Even in the case 
of zero uncertainties, we achieve a relative gain in accuracy 
that is about 2 on hi (for sample sizes N = 800) and higher 
for /13. These quantities cannot but improve in presence of 
observational uncertainties and estimates for the probabili- 
ties of membership. 

To ensure that our methods can give reliable descrip- 
tions of the shape, we present a simple test that is able to 
assess the statistical quality of the fit. This is obtained by 
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quantifying the scatter that limited sampling implies on the 
average value of the likelihood, which can be done analyti- 
cally. We apply this test to a practical example and confirm 
that it is indeed able to identify cases in which the adopted 
family of line profile is not able to provide a good statistical 
representation of the data. 

We apply our formalism to the discrete velocity datasets 
of the dwarf spheroidals of the Milky Way. We quantify the 
effects of apparent rotation due to systematic proper mo- 
tions and find that these are not an issue for the dSphs. 
We measure detailed radial profiles for the symmetric de- 
viations in Fornax, Sculptor, Carina and Sextans. All sys- 
tems but Fornax are characterized by line of sight profiles 
that are substantially more peaked than Gaussian o utside 
the centre. If interpreted followin g both iGerhardl l| 19931 ) 
and Ivan der Marel k, Franxl |l993), this suggests a radially 
biased orbital structure in Sculptor, Carina and Sextans. De- 
tailed dynamical modelling is required in order to quantify 
the orbital structure, and to assess the effects of the stellar 
density distribution as well as those of the unknown inclina- 
tion. Nonetheless, on a qualitative level, the sharply falling 
photometric profile of the dSphs assures us that a signifi- 
cantly peaked velocity dispersion can be robustly associated 
with a radial bias of the orbits. On the other hand, Fornax, 
shows line profiles that are flat-topped at large radii, hence 
perhaps favouring some tangential anisotropy. This suggests 
that it may have had a different recent accretion history to 
the other dSphs. Support for this view point is also provided 
by its distinctive shel l structures (e.g.. | Coleman et alj 120051 ; 
lOlszewski et alJl200d : IColeman fc de Jondl200^1 ~ 

We also consider the angular behaviour of the asym- 
metric deviations from Gaussianity. In Fornax, we find a 
systematic angular trend, that we interpret as originating in 
a small level of intrinsic rotation. In fact, we find that this 
trend is mirrored in systematic residuals in the mean velocity 
with respect to the astrometrically determined proper mo- 
tion. This is consistent with a mild intrinsic rotation about 
the minor axis, reaching about 1 kms -1 in the radial range 
covered by the kinematic sample. 

Our methods for characterizing the shapes of line pro- 
files in discrete kinematic datasets are both powerful and 
adaptable. In recent times, the size and variety of such 
datasets has increased substantially, with applications that 
range from the kinematics of the globular cluster popula- 
tions in relatively distant massive galaxies to precision kine- 
matics of giant stars in our own Galactic neighborhood. We 
anticipate that our methods will find ready application to 
a rich variety of datasets, and are actively pursuing further 
applications. 
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Figure Al. The construction of asymmetric distributions. The 
upper panel displays the comparison between the identity func- 
tion X(s, a = 0; v) (in red) and two analogous functions (with 
s = 0) obtained for nonzero values of a (respectively a = 0.3 as 
a dashed line and a = 0.6 as a dashed line). The lower panel 
illustrates, with the same graphical coding, the associated distri- 
butions Jz?', constructed as in eqn (1171) . 
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APPENDIX A: CONSTRUCTION OF 
ASYMMETRIC DEVIATIONS 

We considered several alternatives for the implementation of 
asymmetric deviations starting with a simple parametriza- 
tion compatible with the general form 

f = fe{s)(l-af (s,a)), (Al) 

where f a and f a are respectively an even and odd function 
of one of the components of the tangential velocity Vt, for 
example vg, and a is the parameter for asymmetric devia- 
tions. This approach has not been successful for at least two 
reasons. 

First, the asymmetric deviations produced by the func- 
tional form (|A1|) - which has to satisfy the consistency re- 
quirement / - are too small for our purposes. It is 
importnat to have a large template of deviations during 
the measuring procedure. Depending on the sample size and 
given the natural accuracy limits we quantified in Sect 12.11 
even in the case of an intrinsically symmetric distribution, 
distributions with a large asymmetry (/13 ^ 0.1) are needed 
in order to assess a reliable errorbar. Second, the functional 
form (|A1|I has the significant limit of correlating symmet- 
ric and asymmetric deviations. At fixed s, fi os {v\\ = 0) is 
in fact invariant with respect to a, anh hence, asymmetric 
deviations come together with a more spiky profile, which is 
not a desirable feature. 

Our implementation of asymmetries, presented in 



eqn. (|17[) is able to overcome both difficulties. Fig. IA1I il- 
lustrates the main ingredients of this approach. The upper 
panel displays the comparison between the identity function 
in red, representing the case X(s, a = 0;v), and two analo- 
gous functions obtained for s = but nonzero values of a. In 
the lower panel we display the associated distributions .if' , 
in comparison with the Gaussian profile in red. Our function 
X in eqn (|18[) is constructed in order to comply with a series 
of requirements. 

• X is asymptotic to 11 at both negative and positive ex- 
tremes of the real axis. This is achieved by the structure 
X — v ~ exp(t> 4 ). 

• The choice of the fourth power (rather than the second, 
for example) assures that asymmetric distributions are not 
significantly spikier than the associated symmetric distribu- 
tion: X and v are almost parallel when X « 0. 

• The dependences on a in the exponential term are re- 
quired so to adapt the magnitude of the deviations of X 
from v to the size of the interval (as well as its position) 
where these deviations need to affect .if'. 

• X has to cross the identity function in order to guaran- 
tee a shallow decline on one of the wings of the asymmetric 
distribution, that in turn crosses the associated symmetric 
distribution. This is obtained by the linear term that multi- 
plies the exponential one in X. 

• This crossing point is adjusted to both the shape of the 
symmetric distribution and to the magnitude of the asym- 
metric deviations by the dependences on s and a in the 
mentioned linear term. 

• Finally, the multiplication by a/\a\ ensures that the 
magnitude and shape of asymmetric deviations are identical 
(other than in direction) for positive and negative values of 
a. 
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